
    Incremental Algorithms for Effective and Efficient Query Recommendation

    Query recommender systems give users hints on possible interesting queries relative to their information needs. Most query recommenders are based on static knowledge models built on the basis of past user behaviors recorded in query logs. These models should be periodically updated, or rebuilt from scratch, to keep up with the possible variations in the interests of users. We study query recommender algorithms that generate suggestions on the basis of models that are updated continuously, each time a new query is submitted. We extend two state-of-the-art query recommendation algorithms and evaluate the effects of continuous model updates on their effectiveness and efficiency. Tests conducted on an actual query log show that contrasting model aging by continuously updating the recommendation model is a viable and effective solution.
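
    As a minimal illustration of continuously updated recommendation models (not the two algorithms extended in the paper), the sketch below keeps a query co-occurrence model that is refreshed as each new query is submitted; the class, scoring rule, and example queries are all invented for the example.

```python
from collections import defaultdict

class IncrementalQueryRecommender:
    """Toy query recommender whose model is refreshed on every submitted query,
    instead of being rebuilt periodically from a query log.  Illustrative only;
    not the two algorithms extended in the paper."""

    def __init__(self):
        # query -> {follow-up query: number of times it followed in some session}
        self.followers = defaultdict(lambda: defaultdict(int))
        self.last_query = {}  # session id -> most recent query in that session

    def submit(self, session_id, query):
        """Record a newly submitted query and update the model immediately."""
        prev = self.last_query.get(session_id)
        if prev is not None:
            self.followers[prev][query] += 1
        self.last_query[session_id] = query

    def recommend(self, query, k=3):
        """Suggest the k queries that most often followed `query` so far."""
        counts = self.followers.get(query, {})
        return sorted(counts, key=counts.get, reverse=True)[:k]

rec = IncrementalQueryRecommender()
rec.submit("s1", "cheap flights")
rec.submit("s1", "cheap flights rome")
rec.submit("s2", "cheap flights")
rec.submit("s2", "rome hotels")
print(rec.recommend("cheap flights"))  # ['cheap flights rome', 'rome hotels']
```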

    Ranking and clustering of nodes in networks with smart teleportation

    Random teleportation is a necessary evil for ranking and clustering directed networks based on random walks. Teleportation enables ergodic solutions, but the solutions must necessarily depend on the exact implementation and parametrization of the teleportation. For example, in the commonly used PageRank algorithm, the teleportation rate must trade off a heavily biased solution with a uniform solution. Here we show that teleportation to links rather than nodes enables a much smoother trade-off and effectively more robust results. We also show that, by not recording the teleportation steps of the random walker, we can further reduce the effect of teleportation with dramatic effects on clustering.
    Comment: 10 pages, 7 figures
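
    A rough sketch of how the choice of teleportation target changes the ranking, assuming a small dense adjacency matrix and using an in-link-weighted teleportation vector as a stand-in for the authors' link teleportation; this is not their exact formulation.

```python
import numpy as np

def pagerank(adj, alpha=0.85, teleport_to_links=True, n_iter=200):
    """Power-iteration PageRank on a dense 0/1 adjacency matrix (adj[i, j] = 1
    means a link i -> j).  With teleport_to_links=True the teleportation vector
    is proportional to in-link counts rather than uniform, a rough stand-in for
    the 'teleport to links' idea, not the authors' exact formulation."""
    n = adj.shape[0]
    adj = adj.astype(float)
    out_deg = adj.sum(axis=1)
    # Row-stochastic walk matrix; dangling nodes redistribute uniformly.
    P = np.where(out_deg[:, None] > 0, adj / np.maximum(out_deg, 1)[:, None], 1.0 / n)
    in_deg = adj.sum(axis=0)
    v = in_deg / in_deg.sum() if teleport_to_links else np.full(n, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r = alpha * (r @ P) + (1 - alpha) * v
    return r

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
print(pagerank(A, teleport_to_links=False))  # standard uniform teleportation
print(pagerank(A, teleport_to_links=True))   # teleportation weighted by in-links
```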

    TimeMachine: Timeline Generation for Knowledge-Base Entities

    We present a method called TIMEMACHINE to generate a timeline of events and relations for entities in a knowledge base. For example, for an actor, such a timeline should show the most important professional and personal milestones and relationships such as works, awards, collaborations, and family relationships. We develop three orthogonal timeline quality criteria that an ideal timeline should satisfy: (1) it shows events that are relevant to the entity; (2) it shows events that are temporally diverse, so they distribute along the time axis, avoiding visual crowding and allowing for easy user interaction, such as zooming in and out; and (3) it shows events that are content diverse, so they contain many different types of events (e.g., for an actor, it should show movies and marriages and awards, not just movies). We present an algorithm to generate such timelines for a given time period and screen size, based on submodular optimization and web co-occurrence statistics with provable performance guarantees. A series of user studies using Mechanical Turk shows that all three quality criteria are crucial to produce quality timelines and that our algorithm significantly outperforms various baseline and state-of-the-art methods.
    Comment: To appear at ACM SIGKDD KDD'15. 12pp, 7 fig. With appendix. Demo and other info available at http://cs.stanford.edu/~althoff/timemachine
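
    The toy greedy selection below only illustrates how the three criteria (relevance, temporal diversity, content diversity) can be traded off when picking events; it is not the paper's submodular objective, and the weights and event tuples are invented.

```python
def greedy_timeline(events, k, w_time=1.0, w_type=1.0):
    """Greedily pick k events balancing relevance, temporal spread, and type
    diversity.  Each event is a (year, type, relevance) tuple.  This is only a
    toy illustration of the three criteria, not the paper's submodular objective."""
    chosen = []
    while len(chosen) < min(k, len(events)):
        best, best_gain = None, float("-inf")
        for e in events:
            if e in chosen:
                continue
            year, etype, rel = e
            # Temporal diversity: distance (in years) to the closest chosen event.
            t_gain = min((abs(year - y) for y, _, _ in chosen), default=50)
            # Content diversity: reward event types not yet on the timeline.
            d_gain = 0 if any(t == etype for _, t, _ in chosen) else 1
            gain = rel + w_time * t_gain / 50.0 + w_type * d_gain
            if gain > best_gain:
                best, best_gain = e, gain
        chosen.append(best)
    return sorted(chosen)

events = [(1994, "movie", 0.9), (1995, "movie", 0.8), (2003, "award", 0.7),
          (2001, "marriage", 0.4), (2010, "movie", 0.6)]
print(greedy_timeline(events, k=3))
# Picks the top movie, then the award and the marriage rather than more movies.
```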

    Fast Searching in Packed Strings

    Given strings $P$ and $Q$ the (exact) string matching problem is to find all positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let $m \leq n$ be the lengths of $P$ and $Q$, respectively, and let $\sigma$ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time $O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right)$. Here $\mathrm{occ}$ is the number of occurrences of $P$ in $Q$. For $m = o(n)$ this improves the $O(n)$ bound of the Knuth-Morris-Pratt algorithm. Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal since any algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.
    Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200
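
    For reference, a plain character-by-character Knuth-Morris-Pratt matcher, the $O(n)$ baseline that the packed algorithm improves for small $m$; the packed, tabulation-based automaton itself is not reproduced here.

```python
def kmp_search(P, Q):
    """Classical Knuth-Morris-Pratt matching: return all starting positions of
    P in Q in O(|P| + |Q|) time.  This is the unpacked baseline; the packed
    word-RAM automaton from the paper is not shown."""
    m = len(P)
    # failure[i] = length of the longest proper border of P[:i + 1]
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and P[i] != P[k]:
            k = failure[k - 1]
        if P[i] == P[k]:
            k += 1
        failure[i] = k
    occurrences, k = [], 0
    for i, c in enumerate(Q):
        while k > 0 and c != P[k]:
            k = failure[k - 1]
        if c == P[k]:
            k += 1
        if k == m:
            occurrences.append(i - m + 1)
            k = failure[k - 1]
    return occurrences

print(kmp_search("aba", "ababababa"))  # [0, 2, 4, 6]
```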

    Revisiting the Problem of Searching on a Line

    We revisit the problem of searching for a target at an unknown location on a line when given upper and lower bounds on the distance D that separates the initial position of the searcher from the target. Prior to this work, only asymptotic bounds were known for the optimal competitive ratio achievable by any search strategy in the worst case. We present the first tight bounds on the exact optimal competitive ratio achievable, parameterized in terms of the given bounds on D, along with an optimal search strategy that achieves this competitive ratio. We prove that this optimal strategy is unique. We characterize the conditions under which an optimal strategy can be computed exactly and, when it cannot, we explain how numerical methods can be used efficiently. In addition, we answer several related open questions, including the maximal reach problem, and we discuss how to generalize these results to m rays, for any m >= 2.
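
    For context, a sketch of the classical doubling (zig-zag) strategy for the unbounded version of the problem, which is 9-competitive in the worst case; the paper's optimal strategy for bounded D is different and is not reproduced here.

```python
def doubling_search(target, step=1.0):
    """Classical doubling strategy for searching a line from the origin: sweep
    1 right, 2 left, 4 right, ... until the target is crossed, and return the
    total distance walked.  This is the well-known 9-competitive baseline for
    the unbounded problem, not the paper's optimal strategy for bounded D."""
    travelled = 0.0
    position = 0.0
    bound = step
    direction = 1
    while True:
        turn_point = direction * bound
        # Check whether the target lies on the segment we are about to sweep.
        lo, hi = min(position, turn_point), max(position, turn_point)
        if lo <= target <= hi:
            travelled += abs(target - position)
            return travelled
        travelled += abs(turn_point - position)
        position = turn_point
        # Return to the origin before sweeping the other side.
        travelled += abs(position)
        position = 0.0
        bound *= 2
        direction = -direction

print(doubling_search(3.5))  # distance walked by the strategy
print(abs(3.5))              # distance walked by an omniscient searcher
```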

    Data Portraits and Intermediary Topics: Encouraging Exploration of Politically Diverse Profiles

    In micro-blogging platforms, people connect and interact with others. However, due to cognitive biases, they tend to interact with like-minded people and read agreeable information only. Many efforts to make people connect with those who think differently have not worked well. In this paper, we hypothesize, first, that previous approaches have not worked because they have been direct -- they have tried to explicitly connect people with those having opposing views on sensitive issues. Second, that neither recommendation nor presentation of information by itself is enough to encourage behavioral change. We propose a platform that mixes a recommender algorithm and a visualization-based user interface to explore recommendations. It recommends politically diverse profiles in terms of distance of latent topics, and displays those recommendations in a visual representation of each user's personal content. We performed an "in the wild" evaluation of this platform, and found that people explored more recommendations when using a biased algorithm instead of ours. In line with our hypothesis, we also found that the mixture of our recommender algorithm and our user interface allowed politically interested users to exhibit an unbiased exploration of the recommended profiles. Finally, our results contribute insights on two aspects: first, which individual differences are important when designing platforms aimed at behavioral change; and second, which algorithms and user interfaces should be mixed to help users avoid cognitive mechanisms that lead to biased behavior.
    Comment: 12 pages, 7 figures. To be presented at ACM Intelligent User Interfaces 201
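
    A crude sketch of ranking profiles by latent-topic distance, only a stand-in for the paper's recommender; the topic vectors and the "most distant" selection rule are invented for illustration.

```python
import numpy as np

def diverse_profile_recommendations(user_topics, candidate_topics, k=3):
    """Illustrative sketch only: rank candidate profiles by cosine distance
    between latent-topic vectors and return the k most distant ones, a crude
    stand-in for recommending politically diverse profiles."""
    u = user_topics / np.linalg.norm(user_topics)
    C = candidate_topics / np.linalg.norm(candidate_topics, axis=1, keepdims=True)
    distances = 1.0 - C @ u                  # cosine distance to the user
    return np.argsort(distances)[-k:][::-1]  # indices of the most distant profiles

user = np.array([0.8, 0.1, 0.1])            # user mostly about topic 0
cands = np.array([[0.7, 0.2, 0.1],          # similar profile
                  [0.1, 0.8, 0.1],          # different profile
                  [0.1, 0.1, 0.8]])         # different profile
print(diverse_profile_recommendations(user, cands, k=2))  # the two farthest profiles
```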

    Predicting your next OLAP query based on recent analytical sessions

    In Business Intelligence systems, users interact with data warehouses by formulating OLAP queries aimed at exploring multidimensional data cubes. Being able to predict the most likely next queries would provide a way to recommend interesting queries to users on the one hand, and could improve the efficiency of OLAP sessions on the other. In particular, query recommendation would proactively guide users in data exploration and improve the quality of their interactive experience. In this paper, we propose a framework to predict the most likely next query and recommend it to the user. Our framework relies on a probabilistic user behavior model built by analyzing previous OLAP sessions and exploiting a query similarity metric. To gain insight into the recommendation precision and the parameters it depends on, we evaluate our approach using different quality assessments.
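
    A minimal sketch of a probabilistic next-query predictor, assuming a first-order Markov model estimated from past sessions; the paper's behavior model and query similarity metric are richer than this, and the query identifiers below are made up.

```python
from collections import defaultdict

def train_markov_model(sessions):
    """Estimate first-order transition probabilities P(next | current) from past
    OLAP sessions (each session is a list of query identifiers).  A minimal
    sketch of a probabilistic user-behavior model, not the paper's framework."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        model[cur] = {q: c / total for q, c in nexts.items()}
    return model

def predict_next(model, current_query):
    """Return the most likely next query, or None if the query was never seen."""
    candidates = model.get(current_query)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

sessions = [["q_sales_by_year", "q_sales_by_year_region", "q_top_products"],
            ["q_sales_by_year", "q_sales_by_year_region"]]
model = train_markov_model(sessions)
print(predict_next(model, "q_sales_by_year"))  # q_sales_by_year_region
```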

    Exploiting the Social Capital of Folksonomies for Web Page Classification

    Dynamic Set Intersection

    Consider the problem of maintaining a family $F$ of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given $S, S' \in F$, report every member of $S \cap S'$ in any order. We show that in the word RAM model, where $w$ is the word size, given a cap $d$ on the maximum size of any set, we can support set intersection queries in $O\left(\frac{d}{w/\log^2 w}\right)$ expected time, and updates in $O(\log w)$ expected time. Using this algorithm we can list all $t$ triangles of a graph $G=(V,E)$ in $O\left(m+\frac{m\alpha}{w/\log^2 w} + t\right)$ expected time, where $m=|E|$ and $\alpha$ is the arboricity of $G$. This improves a 30-year old triangle enumeration algorithm of Chiba and Nishizeki running in $O(m\alpha)$ time. We provide an incremental data structure on $F$ that supports intersection witness queries, where we only need to find one $e \in S \cap S'$. Both queries and insertions take $O\left(\sqrt{\frac{N}{w/\log^2 w}}\right)$ expected time, where $N = \sum_{S \in F} |S|$. Finally, we provide time/space tradeoffs for the fully dynamic set intersection reporting problem. Using $M$ words of space, each update costs $O(\sqrt{M \log N})$ expected time, each reporting query costs $O\left(\frac{N\sqrt{\log N}}{\sqrt{M}}\sqrt{op+1}\right)$ expected time where $op$ is the size of the output, and each witness query costs $O\left(\frac{N\sqrt{\log N}}{\sqrt{M}} + \log N\right)$ expected time.
    Comment: Accepted to WADS 201
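
    A toy illustration of the word-packing idea behind these bounds: over a small universe a set fits in a single machine word, so intersection reporting is one AND plus bit extraction. The paper's structure handles arbitrary universes (hashing elements into packed words) and achieves the stated bounds; that is not reproduced here.

```python
class PackedSet:
    """Toy word-packed set over a small universe {0, ..., u-1}: elements are
    bits of a machine word, so intersections are single AND operations.  This
    sketches only the packing idea, not the paper's data structure."""

    def __init__(self, elements=()):
        self.bits = 0
        for e in elements:
            self.insert(e)

    def insert(self, e):
        self.bits |= 1 << e

    def delete(self, e):
        self.bits &= ~(1 << e)

    def intersect(self, other):
        """Report every element of the intersection."""
        common = self.bits & other.bits
        out = []
        while common:
            low = common & -common           # isolate the lowest set bit
            out.append(low.bit_length() - 1)
            common ^= low
        return out

S = PackedSet([1, 4, 7, 9])
T = PackedSet([4, 5, 9])
print(S.intersect(T))  # [4, 9]
```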

    Expected length of the longest common subsequence for large alphabets

    We consider the length $L$ of the longest common subsequence of two $n$-character words chosen uniformly and independently at random over a $k$-ary alphabet. Subadditivity arguments yield that the expected value of $L$, normalized by $n$, converges to a constant $C_k$. We prove a conjecture of Sankoff and Mainville from the early 80's claiming that $C_k\sqrt{k}$ goes to 2 as $k$ goes to infinity.
    Comment: 14 pages, 1 figure, LaTeX
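
    A quick way to look at the constant $C_k$ empirically: estimate $E[L]/n$ by Monte Carlo using the standard quadratic LCS dynamic program. The sample sizes below are arbitrary; for large $k$ the estimate times $\sqrt{k}$ should be in the ballpark of 2.

```python
import random

def lcs_length(a, b):
    """Standard O(n^2) dynamic program for the longest common subsequence length."""
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0
        for j, y in enumerate(b, start=1):
            cur = dp[j]
            dp[j] = prev + 1 if x == y else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def estimate_Ck(k, n=500, trials=10):
    """Monte Carlo estimate of E[L]/n for two random n-character words over a
    k-ary alphabet (an empirical look at C_k; parameters are arbitrary)."""
    total = 0
    for _ in range(trials):
        a = [random.randrange(k) for _ in range(n)]
        b = [random.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)

print(estimate_Ck(k=2))          # around 0.8 for a binary alphabet
print(estimate_Ck(k=100) * 10)   # times sqrt(k); should be roughly 2 for large k
```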